Sense-Aware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia
Authors
Abstract
Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSMs) treat all occurrences of a word as the same and build a single vector to represent its meaning, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which can account for homonymy and polysemy. The “sense-specific” prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluations on semantic relatedness, for both isolated words and words in sentential contexts, and on word sense induction demonstrate its effectiveness.
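The multi-prototype idea described in the abstract can be illustrated with a minimal sketch. This is not the SaSA method itself: it assumes toy bag-of-words contexts in place of Wikipedia pages, a fixed number of senses `k`, and a simple cosine-based k-means seeded with the first `k` contexts. All function names (`bow`, `sense_prototypes`, `closest_sense`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector for a context, stored as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({k: c / len(vectors) for k, c in total.items()})

def sense_prototypes(contexts, k, iterations=10):
    """Cluster the contexts of one word and return one centroid
    (a "prototype") per non-empty cluster."""
    vectors = [bow(c) for c in contexts]
    centers = vectors[:k]  # assumption: seed with the first k contexts
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in vectors:
            best = max(range(len(centers)), key=lambda i: cosine(v, centers[i]))
            clusters[best].append(v)
        centers = [centroid(c) for c in clusters if c]
    return centers

def closest_sense(prototypes, context):
    """Index of the prototype most similar to a new context."""
    query = bow(context)
    return max(range(len(prototypes)), key=lambda i: cosine(query, prototypes[i]))

# Toy contexts for the ambiguous word "apple" (invented for illustration).
contexts = [
    "apple fruit tree orchard juice",
    "apple iphone computer software company",
    "apple pie fruit sweet juice",
    "apple mac computer iphone store",
]
prototypes = sense_prototypes(contexts, k=2)
print(closest_sense(prototypes, "fruit juice"))
print(closest_sense(prototypes, "computer software"))
```

A single-prototype model would average all four contexts into one vector; keeping one centroid per cluster lets a new context be matched against each sense separately.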
Similar resources
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Semantic Representations of Word Senses and Concepts
Representing the semantics of linguistic items in a machine-interpretable form has been a major goal of Natural Language Processing since its earliest days. Among the range of different linguistic items, words have attracted the most research attention. However, word representations have an important limitation: they conflate different meanings of a word into a single vector. Representations of...
Wikipedia-based Compact Hierarchical Semantics for Natural Language Processing
A correct semantic representation of words and texts underlies many text processing tasks such as text categorization, word sense disambiguation, and semantic relatedness assessment. It has long been recognized that computers require access to common-sense and domain-specific world knowledge in order to process textual data at a deeper level. In this paper, we present a novel representation of ...
On Modeling Sense Relatedness in Multi-prototype Word Embedding
To enhance the expressive ability of distributional word representation learning models, many researchers induce word senses through clustering and learn multiple embedding vectors for each word, an approach known as the multi-prototype word embedding model. However, most related work ignores the relatedness among word senses, which actually plays an important role. In this paper, we propose a novel appro...
Publication date: 2015